Multivariate Structural Statespace Components #529

jessegrabowski · 2025-06-25T14:23:55Z

This PR lifts the requirement that models built with the structural sub-module of PyMC be univariate. It's a chonky PR, so I split it into commits. Most of the file changes come from the first commit, which is just a reorganization of files; it is safe to ignore that one.

Here are the steps I followed:

1. The structural module was getting pretty unwieldy, so I broke it into a bunch of sub-files. This makes the code easier to find and extend. This is handled in the "Reorganize structural model module" commit.
2. We need tools that can merge different components with potentially different (or overlapping) observed time series. This is handled by the "Allow combination of components with different numbers of observed states" commit. I am confident this code can be improved.
3. Each component needs new logic to handle the case where there are multiple observed series. Users can optionally pass a list of names to each component as observed_state_names. Every time you add two components together, all the relevant matrices are padded and expanded, and the total set of observed states is the union of the components' observed states.
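To make step 3 concrete, here is a minimal numpy-only sketch of what adding two components involves. It assumes each component is just a dict holding its observed names, transition matrix T and design matrix Z; the dict layout and the helpers block_diag and combine_components are hypothetical, not the pymc_extras API (which builds these matrices with pytensor). Hidden states of different components never interact, so T is block-diagonal, and each component's design rows are scattered into the union of observed states, so a series observed by both components loads on both.

```python
import numpy as np


def block_diag(*mats):
    """Stack 2-D arrays along the diagonal (numpy stand-in for a block_diag op)."""
    n_rows = sum(m.shape[0] for m in mats)
    n_cols = sum(m.shape[1] for m in mats)
    out = np.zeros((n_rows, n_cols))
    r = c = 0
    for m in mats:
        out[r : r + m.shape[0], c : c + m.shape[1]] = m
        r += m.shape[0]
        c += m.shape[1]
    return out


def combine_components(a, b):
    """Hypothetical sketch: merge two components that may observe different series."""
    # Union of observed state names, preserving first-seen order
    observed = list(dict.fromkeys(a["observed"] + b["observed"]))
    # Hidden states never interact across components, so T is block-diagonal
    T = block_diag(a["T"], b["T"])
    # Scatter each component's design rows into the rows of the union;
    # a shared observed state receives contributions from both components
    Z = np.zeros((len(observed), T.shape[1]))
    col = 0
    for comp in (a, b):
        k = comp["T"].shape[0]
        for i, name in enumerate(comp["observed"]):
            Z[observed.index(name), col : col + k] = comp["Z"][i]
        col += k
    return {"observed": observed, "T": T, "Z": Z}


# A local level on both series, plus an AR(1) on data_2 only
level = {"observed": ["data_1", "data_2"], "T": np.eye(2), "Z": np.eye(2)}
ar = {"observed": ["data_2"], "T": np.array([[0.5]]), "Z": np.array([[1.0]])}

combined = combine_components(level, ar)
print(combined["observed"])  # ['data_1', 'data_2']
print(combined["Z"])
# [[1. 0. 0.]
#  [0. 1. 1.]]
```

Here data_2 ends up observing the sum of its own level and the AR state, while data_1 only sees its level.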

For now, we assume all states in a component follow the same parameterization. To work around this, it's now also valid to add the same component twice with different observed states. For example, AutoRegressive(order=1, observed_state_names=['data_1']) + AutoRegressive(order=5, observed_state_names=['data_2']) would be a valid model with 2 observed states, each with its own autoregressive dynamics.
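The two-AutoRegressive example can be sketched by hand, assuming the usual companion-form state layout for an AR(p) process (the component's actual state ordering may differ, and the coefficients below are made up). The AR(1) on data_1 contributes one hidden state and the AR(5) on data_2 contributes five, so the combined model has 6 hidden states but only 2 observed ones.

```python
import numpy as np


def ar_companion(coefs):
    """Companion-form transition matrix for an AR(p) process (assumed layout)."""
    p = len(coefs)
    T = np.zeros((p, p))
    T[0, :] = coefs  # first row applies the AR coefficients to the lags
    T[1:, :-1] = np.eye(p - 1)  # remaining rows shift each lag down by one
    return T


T1 = ar_companion([0.7])  # AR(1) for data_1
T5 = ar_companion([0.3, 0.2, 0.1, 0.05, 0.02])  # AR(5) for data_2

# No interaction between the two processes: block-diagonal transition
T = np.block([[T1, np.zeros((1, 5))], [np.zeros((5, 1)), T5]])

# Each observed series reads the current value (first state) of its own AR block
Z = np.zeros((2, 6))
Z[0, 0] = 1.0  # data_1
Z[1, 1] = 1.0  # data_2

print(T.shape, Z.shape)  # (6, 6) (2, 6)
```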

When you pass a batch of observed_state_names, e.g. LevelTrend(order=2, observed_state_names=['data_1', 'data_2']), the parameters will all be given a batch dimension, but will otherwise be the same as the base case.
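As an illustration of "batched but otherwise the same", the sketch below builds the order-2 LevelTrend matrices for two observed series with numpy. It assumes each series gets its own copy of the usual local linear trend block and that batched parameters simply gain a leading observed-state dimension; the parameter name initial_trend and its values are hypothetical. np.kron(np.eye(k), X) is a compact way to copy X along the diagonal k times.

```python
import numpy as np

k_endog, order = 2, 2  # two observed series, level + trend

# Transition and design for a single series (local linear trend)
T_single = np.array([[1.0, 1.0], [0.0, 1.0]])
Z_single = np.array([[1.0, 0.0]])

# kron(I, X) copies X along the diagonal, one block per observed series
T = np.kron(np.eye(k_endog), T_single)
Z = np.kron(np.eye(k_endog), Z_single)

# Hypothetical batched parameter: one row per observed state
initial_trend = np.array([[10.0, 0.5], [3.0, -0.1]])  # shape (k_endog, order)
x0 = initial_trend.ravel()  # state order: [d1 level, d1 trend, d2 level, d2 trend]

y1 = Z @ (T @ x0)  # noise-free observation one step ahead
print(T.shape, Z.shape)  # (4, 4) (2, 4)
print(y1)  # [10.5  2.9]
```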

More docs are coming, but I tried to obsessively document what's in there so far.

The logic for extending the components is pretty straightforward (mostly copying plus block_diag or concat), but there are some corner cases that need attention.

This PR should be seen as a companion to #450. Instead of vectorizing across the computation of a model, we're concatenating models. There will be cases where this is superior -- for example when you want to explicitly model latent interactions between components. But in other cases, this approach will be worse. I am interested in having both.

AlexAndorra · 2025-06-26T20:42:30Z

AutoRegressive(order=1, observed_state_names=['data_1']) + AutoRegressive(order=5, observed_state_names=['data_2']) would be a valid model with 2 observed states, but each has its own autoregressive dynamics.

This is cool! I will review ASAP.

Note that #450 is currently blocked by what I think is a pytensor bug

pymc_extras/statespace/models/utilities.py

AlexAndorra

This is 🔥 @jessegrabowski 🤯
I just left a suggestion for what I think was a typo in the docstring. I'll merge once this is resolved, and then test all of this for our PyData tutorial -- probably this weekend.

Just a quick question: IIUC, now users can also have batched RegressionComponents, correct?

Components still missing this feature:

Cycle (currently worked on by @AlexAndorra)
Seasonal
Regression (currently worked on by @Dekermanjian)

We also need to:

Make sure there is a test showing that a combined LevelTrend + AR + error model for two observed variables, with no interactions, matches two separate models (one per variable) given the same parameters.
Make sure that pytensor ops are used everywhere for building the SS matrices (no numpy/scipy)
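The equivalence test in the first item can be prototyped outside the library. This numpy-only sketch uses a textbook Kalman filter and made-up parameters; block_diag, kalman_loglik and level_trend_ar are hypothetical helpers, not the pymc_extras API. Each series gets level + trend + AR(1) + measurement error, and with block-diagonal system matrices and initial covariance the joint log-likelihood should equal the sum of the two separate ones. A real test would build both models through the structural components instead.

```python
import numpy as np

# Hypothetical helpers for illustration; not the pymc_extras API


def block_diag(*mats):
    """Place 2-D arrays along the diagonal of a zero matrix."""
    out = np.zeros((sum(m.shape[0] for m in mats), sum(m.shape[1] for m in mats)))
    r = c = 0
    for m in mats:
        out[r : r + m.shape[0], c : c + m.shape[1]] = m
        r, c = r + m.shape[0], c + m.shape[1]
    return out


def kalman_loglik(y, T, Z, R, Q, H, a0, P0):
    """Gaussian log-likelihood of y (n_steps x k_endog) from a basic Kalman filter."""
    a, P, ll = a0.copy(), P0.copy(), 0.0
    for y_t in y:
        v = y_t - Z @ a  # one-step-ahead prediction error
        F = Z @ P @ Z.T + H  # its covariance
        F_inv = np.linalg.inv(F)
        ll -= 0.5 * (len(v) * np.log(2 * np.pi) + np.linalg.slogdet(F)[1] + v @ F_inv @ v)
        K = P @ Z.T @ F_inv
        a, P = a + K @ v, P - K @ Z @ P  # update step
        a, P = T @ a, T @ P @ T.T + R @ Q @ R.T  # predict step
    return ll


def level_trend_ar(rho, sigmas, sigma_obs):
    """One series: level + trend (order 2) + AR(1) + measurement error."""
    T = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, rho]])
    Z = np.array([[1.0, 0.0, 1.0]])  # observation = level + AR state
    R = np.eye(3)
    Q = np.diag(np.square(sigmas))
    H = np.array([[sigma_obs**2]])
    return T, Z, R, Q, H


rng = np.random.default_rng(0)
y = rng.normal(size=(50, 2)).cumsum(axis=0)  # toy data for two series

m1 = level_trend_ar(0.6, [0.3, 0.05, 0.5], 0.2)
m2 = level_trend_ar(-0.2, [0.1, 0.02, 0.4], 0.3)
a0, P0 = np.zeros(3), np.eye(3) * 10.0

separate = kalman_loglik(y[:, :1], *m1, a0, P0) + kalman_loglik(y[:, 1:], *m2, a0, P0)
joint_mats = [block_diag(A, B) for A, B in zip(m1, m2)]
joint = kalman_loglik(y, *joint_mats, np.zeros(6), block_diag(P0, P0))

print(np.isclose(separate, joint))  # True
```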

AlexAndorra · 2025-07-02T22:12:51Z

I think I'm done for a first review from you on the Cycle component @jessegrabowski 🍾


review-notebook-app · 2025-07-05T14:51:46Z

Check out this pull request on ReviewNB. See visual diffs & provide feedback on Jupyter Notebooks.

jessegrabowski

@AlexAndorra I left comments for you

Since it's my own PR, I can't request changes. In the future, it's better if you fork the PR branch and open a new PR into this PR; then we can do the usual review workflow on your PR and merge it into this PR when we're ready.

jessegrabowski · 2025-07-06T04:24:24Z

pymc_extras/statespace/models/structural/components/cycle.py

+            design_matrix = linalg.block_diag(*[Z for _ in range(self.k_endog)])
+            self.ssm["design", :, :] = pt.as_tensor_variable(design_matrix)
+
+            R = np.eye(2)  # 2x2 identity for each cycle component


What if innovations=False? Does R need to be adjusted in that case?

It was like this before your changes so don't worry about it, but it might need a separate issue.

I think it's OK? From what I understand (but correct me if I'm wrong):

R defines the structure: which states can receive innovations

Q defines the magnitude: how much innovation they receive

When Q = 0, the structure becomes irrelevant: no innovations occur

IIRC, the innovation variance affecting states is R @ Q @ R.T, which is 0 when Q = 0, regardless of R's structure.
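A quick numpy check of this argument: the state innovation covariance R @ Q @ R.T vanishes whenever Q is zero, regardless of the selection matrix. The shock standard deviation 0.3 and the second selection matrix are arbitrary values chosen for illustration.

```python
import numpy as np

k_posdef = 2
R = np.eye(k_posdef)  # cycle selection matrix: identity

Q_on = np.eye(k_posdef) * 0.3**2  # innovations=True: nonzero shock variance
Q_off = np.zeros((k_posdef, k_posdef))  # innovations=False: Q = 0

state_cov_on = R @ Q_on @ R.T  # 0.09 on the diagonal
state_cov_off = R @ Q_off @ R.T  # all zeros

# Any other selection matrix also gives zero state noise when Q = 0
R_other = np.array([[1.0, 0.0], [0.5, 2.0]])
state_cov_off_other = R_other @ Q_off @ R_other.T
```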

LMK if I'm way off here.

I made it clearer in the code, with some comments in make_symbolic_graph

jessegrabowski · 2025-07-06T04:37:41Z

tests/statespace/models/structural/components/test_cycle.py

+    _assert_basic_coords_correct(cycle)
+
+
+def test_cycle_multivariate_deterministic(rng):


In this test, eval the transition, design, and selection matrices and make sure they are what they are supposed to be (check the level_trend tests for an example)

Done, although my code might be too verbose and inefficient. LMK if I can improve it
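For reference, the expected matrices in such a test could be built as below. This sketch assumes the standard damped trigonometric cycle, where each cycle block is rho times a 2x2 rotation by the angular frequency 2*pi/cycle_length, and that each observed state gets its own block, matching the block_diag construction of the design matrix in the diff above. The values of rho and cycle_length are arbitrary. In the actual test these arrays would be compared with assert_allclose against the evaluated transition, design and selection matrices.

```python
import numpy as np

k_endog = 2
cycle_length = 12.0
rho = 0.9  # dampening factor (assumed parameterization)
lam = 2 * np.pi / cycle_length  # angular frequency

# One damped rotation block per observed state
block = rho * np.array([[np.cos(lam), np.sin(lam)], [-np.sin(lam), np.cos(lam)]])
expected_T = np.kron(np.eye(k_endog), block)

# Each observed state loads on the first state of its own cycle block
expected_Z = np.kron(np.eye(k_endog), np.array([[1.0, 0.0]]))

# Selection matrix: identity, every cycle state can receive a shock
expected_R = np.eye(2 * k_endog)
```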

jessegrabowski · 2025-07-06T04:38:14Z

tests/statespace/models/structural/components/test_cycle.py

+        assert_allclose(ratio_0, ratio_i, atol=1e-2, rtol=1e-2)
+
+
+def test_cycle_multivariate_with_innovations_and_cycle_length(rng):


Same here, directly inspect the 3 relevant matrices (plus state_cov I suppose)

Same comment as above

jessegrabowski · 2025-07-06T04:44:41Z

@AlexAndorra @Dekermanjian I want the names of parameters in the components to be really consistent and unsurprising. So please vote on:

1. For the sigma parameters: name_sigma vs sigma_name
2. For the initial state parameters: name_initial vs initial_name vs name
3. For assorted Greek things, like rho in Cycle: name_greek vs greek_name vs descriptive_name vs name_descriptive
4. For shock state names: name vs name_shock vs name_innovation

Concrete examples for (3):
a. business_cycle_rho
b. rho_business_cycle
c. dampening_business_cycle
d. business_cycle_dampening

For (4), I'm talking about the internal state names that will end up as labels for the R and Q matrices, nothing else.

Also, for default names, since they will all depend on the names in the multivariate case, should we:

1. Make all the default names simpler. For example LevelTrend -> level_trend, or Cycle[s={cycle}, dampen={dampen}, innovations={innovations}] -> cycle
2. Keep the complex names for the univariate case, but use a simple default name when it's multivariate
3. Do away with default names, and force the user to always pass a name
4. As 3, but only in the multivariate case

jessegrabowski · 2025-07-06T04:46:19Z

@AlexAndorra Also please add a test adding a cycle component to another cycle component with a different observed state name. Check the resulting matrices come out as expected.
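The expected result of such an addition can be written down directly, assuming the same damped-rotation cycle blocks as the Cycle component (an assumption; damped_rotation is a hypothetical helper, and the parameter values are arbitrary). The transition matrix is block-diagonal, and each observed state loads only on the first state of its own cycle.

```python
import numpy as np


def damped_rotation(cycle_length, rho):
    """Hypothetical helper: 2x2 block of a damped trigonometric cycle (assumed form)."""
    lam = 2 * np.pi / cycle_length
    return rho * np.array([[np.cos(lam), np.sin(lam)], [-np.sin(lam), np.cos(lam)]])


# Cycle A observes data_1, cycle B observes data_2, with different parameters
T_a, T_b = damped_rotation(12.0, 0.9), damped_rotation(4.0, 0.5)
zeros = np.zeros((2, 2))

# Expected matrices after addition: block-diagonal T, and each observed
# row of Z loads only on the first state of its own cycle
expected_T = np.block([[T_a, zeros], [zeros, T_b]])
expected_Z = np.array([[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]])
expected_observed = ["data_1", "data_2"]
```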

jessegrabowski · 2025-07-06T05:02:18Z

@Dekermanjian the regression component tests are failing because of this line:

        betas = self.make_and_register_variable(f"beta_{self.name}", shape=(k_endog, k_states))

You need to drop the k_endog part of the shape if k_endog == 1, because we expect a vector in that case (we don't want to give every parameter a dummy observed-state dimension when it doesn't matter). So something like:

        betas = self.make_and_register_variable(f"beta_{self.name}", shape=(k_endog, k_states) if k_endog > 1 else (k_states,))
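The shape rule in this suggestion can be isolated as a small function. beta_shape is a hypothetical helper for illustration only; in the component, the conditional is written inline in the make_and_register_variable call.

```python
def beta_shape(k_endog: int, k_states: int) -> tuple[int, ...]:
    """Hypothetical helper: shape of the regression coefficients.

    Univariate models keep a plain vector; only multivariate models get a
    leading observed-state dimension.
    """
    return (k_endog, k_states) if k_endog > 1 else (k_states,)


print(beta_shape(1, 3))  # (3,)
print(beta_shape(2, 3))  # (2, 3)
```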

Also you have beta.reshape((-1, 1)).squeeze(); this is just beta.ravel().
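A quick numpy illustration of why the two spellings are interchangeable for coefficient arrays with more than one element, and of the one edge case where they differ. This uses numpy arrays; the pytensor behavior of the size-1 edge case has not been checked here.

```python
import numpy as np

beta = np.arange(6.0).reshape(2, 3)  # e.g. a (k_endog, k_states) coefficient matrix

# Reshape to a column, then squeeze away the length-1 axis...
a = beta.reshape((-1, 1)).squeeze()
# ...which gives the same 1-D array as a plain ravel
b = beta.ravel()

# Edge case: with a single element, squeeze drops to 0-d while ravel stays 1-D
single = np.array([1.5])
print(single.reshape((-1, 1)).squeeze().shape, single.ravel().shape)  # () (1,)
```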

Finally, there are some unused computations: you had assigned k_posdef = self.k_posdef // self.k_endog but then didn't use it, and I guess the linter removed the unused variable but left the computation. We can just remove the whole thing.

PS: Please add some multivariate regression tests (I think you already are, but just so it's on the record somewhere, here it is), including a test where you add together two regression components with different state names

Dekermanjian · 2025-07-06T11:34:05Z

I personally prefer these:

sigma_name
initial_name
greek_name
innovation_name

For the default names in the multivariate case, I prefer option 2, or option 3 if it would also apply to the univariate case. I would like univariate and multivariate to be as consistent as possible.

jessegrabowski · 2025-07-07T09:29:14Z

@Dekermanjian I went ahead and fixed up the regression component, so please make sure to pull before you keep working.

The last steps before we merge this are to:

Address issues with Cycle
Add a test for forecasting with multiple observed
Add a test for hidden state decomposition with multiple observed

I think we can also take the rough notebook that @Dekermanjian started and turn it into a tutorial about multiple observed, in the same spirit as the existing one. We can make that a separate PR if we want, but if so, we should drop the notebook from this PR.

@AlexAndorra I need your input on the naming questions I posed above, then we can make a final decision and go with it. Once that's settled, the Cycle fixes are in, and the two tests I'm asking for above are in, I'll rebase this PR so that we have one commit per component (plus one for the utilities) and merge.

AlexAndorra · 2025-07-07T16:49:18Z

Thanks for the incredibly fast progress @jessegrabowski !! Do you need a new review from me?
I'll address your Cycle comment ASAP.

Here are my votes:

For the sigma parameters: sigma_name
For the initial state parameters: initial_name
For assorted Greek things, like rho in Cycle: descriptive_name (I feel very strongly against Greek names in general: they are not descriptive at all and increase the entry cost for users, especially since the state space literature uses different Greek letters all over the place to designate the same quantities)
For shock state names: name_shock

For default names:
2. Keep the complex names for the univariate case, but use a simple default name when it's multivariate

AlexAndorra · 2025-07-07T20:27:12Z

pymc_extras/statespace/models/structural/components/cycle.py

+        # selection matrix R defines structure of innovations (always identity for cycle components)
+        # when innovations=False, state cov Q=0, hence R @ Q @ R.T = 0


Added this to make the R logic clearer @jessegrabowski

AlexAndorra · 2025-07-07T20:28:10Z

pymc_extras/statespace/models/structural/components/cycle.py

+        else:
+            # explicitly set state cov to 0 when no innovations
+            self.ssm["state_cov", :, :] = pt.zeros((self.k_posdef, self.k_posdef))


Same here: I added this clause to make the logic clearer @jessegrabowski. LMK if it's superfluous.

AlexAndorra · 2025-07-07T20:31:07Z

tests/statespace/models/structural/components/test_cycle.py

+    np.testing.assert_allclose(Q, expected_Q)
+
+
+def test_add_multivariate_cycle_components_with_different_observed():


@jessegrabowski , you wrote:

Also please add a test adding a cycle component to another cycle component with a different observed state name. Check the resulting matrices come out as expected.

That's what I tried to do with this test, but LMK if it's overly complex, or even wrong; I wasn't super sure.

jessegrabowski · 2025-07-08T03:34:26Z

Looks like we have consensus on sigma_name and initial_name. Let's agree that in all cases, the name always comes at the end, with a predictable syntax of f'{something}_{name}'.

To that end, let's also go with {name}_shock for the shock dim name. I actually agree with @Dekermanjian that {name}_innovation is a better name, but I can really see myself getting sick of typing "innovation" over and over. Shock is shorter, and it's what we already use.

Regarding Greek vs. descriptive names, I'm really on the fence. I understand why @AlexAndorra is super anti-Greek, and I've done implementations that specifically remove Greek names (changing beta and gamma in BatchNorm to loc and scale here, for example).

On the other hand, we do pay a cost when inventing our own names, unless they are already widely known/used in the specific literature. In the loc and scale example above, ML people probably don't know what that means, so even though it's clearer to us, a practitioner coming in might not agree.

In fact, we already mix descriptive and Greek names. In CycleComponent, we use cycle_dampening_factor and cycle_length, but then in RegressionComponent we use beta (not coef like in AutoRegressive and TimeSeasonality :) ). And of course we use sigma everywhere, but I guess that's a "good greek", not one of those bad greeks that take our jobs.

My suggestion is to punt on this and open a new issue to address package-wide naming conventions. For now, let's just make sure everything is of the form f'{something}_{name}'.
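The agreed pattern is mechanical enough to state as code. component_names is a hypothetical helper for illustration only: parameter names append the component name, while the shock state names use the {name}_shock form agreed above.

```python
def component_names(name: str, params: list[str]) -> dict[str, str]:
    """Hypothetical helper applying the f'{something}_{name}' convention.

    Parameter names put the component name last; the shock dim uses the
    separately agreed f'{name}_shock' pattern.
    """
    names = {p: f"{p}_{name}" for p in params}
    names["shock_dim"] = f"{name}_shock"
    return names


print(component_names("business_cycle", ["sigma", "initial"]))
# {'sigma': 'sigma_business_cycle', 'initial': 'initial_business_cycle', 'shock_dim': 'business_cycle_shock'}
```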

AlexAndorra · 2025-07-08T21:10:15Z

I can really see myself getting sick of typing "innovation" over and over

Ha ha, literally the reasoning behind my choice.

On the other hand, we do pay a cost when inventing our own names, unless they are already widely known/used in the specific literature

Totally agree, it's just that the way I see it, the cost is internal: even if the descriptive names are only our own (which is not the case here, IIUC):

We can define them in the docstrings (in the Greek case, one has to go look on the internet and make sure it's a one-to-one correspondence)
Descriptive is well... more descriptive -- and as Guido says, "explicit is better than implicit"

In CycleComponent, we use cycle_dampening_factor and cycle_length, but then in RegressionComponent we use beta (not coef like in AutoRegressive and TimeSeasonality :) )

And I like CycleComponent much better for that :)

And of course we use sigma everywhere, but I guess that's a "good greek", not one of those bad greeks that take our jobs.

Masterpiece, nothing to add, you killed me 🤣
In all seriousness though, sigma is so universally associated with standard deviation that this is perfectly fine (and shorter to type)

jessegrabowski added 5 commits June 25, 2025 22:17

Reorganize structural model module

a70b733

Allow combination of components with different numbers of observed states

b970a6c

Allow multiple observed in LevelTrend component

7cae487

Allow multiple observed states in measurement error component

bba8431

Allow multiple observed in AutoRegressive component

0a84576

jessegrabowski assigned zaxtax, ricardoV94 and AlexAndorra Jun 25, 2025

jessegrabowski added the enhancements, major, and statespace labels Jun 25, 2025

AlexAndorra reviewed Jun 27, 2025

pymc_extras/statespace/models/utilities.py (outdated, resolved)

AlexAndorra approved these changes Jun 27, 2025

AlexAndorra self-requested a review June 28, 2025 22:11

AlexAndorra requested changes Jun 28, 2025

AlexAndorra and others added 4 commits July 1, 2025 09:37

Fix typo in docstrings

480f4fb

Allow multiple observed in Cycle component

a898eb6

Fix Cycle docstring examples

62d0750

Use pytensor block_diag for Cycle

152e962

Dekermanjian and others added 4 commits July 5, 2025 08:23

1. updated level_trend component coord/param labels

7e9bb07

2. Adjusted the regression component to allow multivariate regression component specification 3. Added a notebook for quick evaluation of the adjustments and additions made

1. removed incorrectly comitted file test_structural.py

c0a4a47

2. replaced scipy block diag with pytensor block diag 3. Added forecast to test model in multivariate ssm notebook

removed incorrectly committed file structural.py

1f3dc3a

Merge pull request #3 from Dekermanjian/multivariate-structural

530f530

Added multivariate regression-component

jessegrabowski added 2 commits July 6, 2025 11:59

Always count names to determine k_endog

0c4590e

LevelTrend state/shock names depend on component name

3c5124d

jessegrabowski commented Jul 6, 2025

Update tests to new names

b932255

jessegrabowski added 4 commits July 6, 2025 13:08

More test updates

6debd23

Delay dropping data names from states/coords until .build

fbc61a1

Remove docstring typo

85b78fe

Update autoregressive component and tests

a6327b7

jessegrabowski added 6 commits July 6, 2025 22:44

Add component name to shock state names

0b20dbc

Allow multiple observed in TimeSeasonality component

a8564b7

Allow multiple observed in FrequencySeasonality component

7581f04

Propagate static shape information in join_tensors_by_dim_labels where possible

a102e3c

Regression component bugfix and tests

c694646

update default name in test

f584e79

jessegrabowski mentioned this pull request Jul 7, 2025

Explore sparse matrices in statespace #535

Open

AlexAndorra added 4 commits July 7, 2025 15:35

Improve cycle code with Jesse's feedback

ab98abe

Explicitly test matrices in test_cycle

08818ad

Add test addition of two Cycles with different observed names

505b7d0

Make code for state cov when no innov clearer

4fc8db2

AlexAndorra reviewed Jul 7, 2025
